Coalescence Type based Confidence Warping for Agglutinative Language Keyword Spotting
Authors
Abstract
In agglutinative languages such as Korean, words are formed by attaching affix morphemes to a stem, which leads to a high out-of-vocabulary (OOV) rate when building a dictionary. Hence, subword units are usually used as the basic language modeling units in Large-Vocabulary Continuous Speech Recognition (LVCSR) and in LVCSR-based applications such as keyword spotting. In this work, a new word property called coalescence type is first introduced; it is defined from the output of the word segmentation process and is therefore specific to agglutinative languages. A confidence warping approach is then proposed that uses this additional linguistic-level information to adjust the confidence measures of keyword candidates. An evaluation on a Korean telephone-speech keyword spotting task shows that precision improves by up to 2%, a significant improvement over the baseline system.
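The abstract does not say how coalescence types are defined or what form the warping function takes, so the following is only a minimal sketch under explicit assumptions, not the authors' method. It assumes (hypothetically) that the coalescence type is derived from how many subword units the segmenter joined to form the keyword, and that "warping" means a monotone, per-type remapping of the recognizer's confidence score, here a slope and bias applied in the logit domain. The names `coalescence_type`, `Warp`, `warp_confidence`, `WARPS`, and `accept`, the type labels, and all parameter values are illustrative inventions.

```python
import math
from dataclasses import dataclass


def coalescence_type(segments: list[str]) -> str:
    """Hypothetical coalescence type: label a word by how many subword
    units the segmenter joined to form it (an assumption, not the paper's definition)."""
    if len(segments) == 1:
        return "single"      # word recognized as a single unit
    if len(segments) == 2:
        return "stem+affix"  # e.g. a stem plus one attached particle
    return "multi"           # stem plus two or more affixes


@dataclass
class Warp:
    slope: float  # must be positive so the warp stays monotone increasing
    bias: float


def warp_confidence(conf: float, warp: Warp) -> float:
    """Assumed warp form: sigmoid(slope * logit(conf) + bias)."""
    eps = 1e-9
    c = min(max(conf, eps), 1.0 - eps)  # clamp to avoid log(0)
    logit = math.log(c / (1.0 - c))
    return 1.0 / (1.0 + math.exp(-(warp.slope * logit + warp.bias)))


# Hypothetical per-type parameters; real values would presumably be
# tuned on held-out keyword spotting data.
WARPS = {
    "single": Warp(1.0, 0.0),      # identity warp
    "stem+affix": Warp(1.0, -0.5),
    "multi": Warp(1.2, -1.0),
}


def accept(segments: list[str], conf: float, threshold: float = 0.5) -> bool:
    """Accept a keyword candidate if its type-warped confidence clears the threshold."""
    t = coalescence_type(segments)
    return warp_confidence(conf, WARPS[t]) >= threshold


if __name__ == "__main__":
    # "학교에서" (at school) segmented as stem "학교" + locative particle "에서"
    print(accept(["학교"], 0.55))          # identity warp keeps 0.55 above 0.5
    print(accept(["학교", "에서"], 0.55))  # negative bias pushes it below 0.5
```

The point of the illustration is that the same raw confidence can lead to different accept/reject decisions depending on the linguistic-level type of the candidate; keeping each warp monotone preserves the ranking of candidates within a type.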
Similar resources
Keyword spotting for highly inflectional languages
This paper presents our new keyword spotting system taking advantage of both the filler model and the confidence measure approaches. The novelty is in a non-standard connection of the filler and the keyword models together with introduction of a new confidence measure based on a keyword normalized score. In detail the paper deals with a decision block. Two methods are introduced. The first is b...
Using phonological phrase segmentation to improve automatic keyword spotting for the highly agglutinating Hungarian language
This paper investigates the usage of prosody for the improvement of keyword spotting, focusing on the highly agglutinating Hungarian language, where keyword spotting cannot be effectively performed using LVCSR, as such systems are either unavailable or hard to operate due to high OOV rates and poor Ngram language modelling capabilities. Therefore, the applied keyword spotting system is based on...
A Piecewise Aggregate Approximation Lower-Bound Estimate for Posteriorgram-Based Dynamic Time Warping
In this paper, we propose a novel lower-bound estimate for dynamic time warping (DTW) methods that use an inner product distance on multi-dimensional posterior probability vectors known as posteriorgrams. Compared to our previous work, the new lower-bound estimate uses piecewise aggregate approximation (PAA) to reduce the time required for calculating the lower-bound estimate. We describe the P...
Spanish Keyword Spotting System Based on Filler Models, Pseudo N-gram Language Model and a Confidence Measure
To efficiently organize many hours of audio content, such as meetings and radio news, searching for spoken keywords is essential. One approach uses filler models to account for non-keyword intervals. Another approach uses a large-vocabulary continuous speech recognition (LVCSR) system, which retrieves a word string and then searches for the keywords in this string. This approach yields high p...
Keyword Spotting with Convolutional Deep Belief Networks and Dynamic Time Warping
To spot keywords on handwritten documents, we present a hybrid keyword spotting system, based on features extracted with Convolutional Deep Belief Networks and using Dynamic Time Warping for word scoring. Features are learned from word images, in an unsupervised manner, using a sliding window to extract horizontal patches. For two single writer historical data sets, it is shown that the propose...
Journal title:
- JSW
Volume 9, Issue
Pages -
Publication date: 2014